Deprivation: “a state of exclusion from the ordinary customs and activities of society.”
The Annual Deprivation Index (ADI) is an up-to-date data source on core deprivation metrics for the cities, towns and regions of England.
Presented by Autonomy in conjunction with academics. Working paper for data release can be found here.
The ADI is constructed using granular, high-frequency indicators in 3 domains:
mental health (sub-domains of NHS quality outcomes framework)
employment (job claimant count data)
crime (various sub-domains, inc. burglary, vehicle crime, violent crime, etc.)
The ADI sums the cases across these domains; combined with data on population per area, this gives a rate (ADI cases per person) for each LSOA or district.
The ADI is a cardinal scale: the level of measurement is numeric rather than only a ranking. The Index of Multiple Deprivation (IMD), by contrast, is ordinal.
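As a rough illustration of the rate calculation, the sketch below sums cases across the three domains and divides by population. The claims count, crime count and population are those of LSOA E01000001 in the LSOA table in this document; the health count is a made-up placeholder, and the helper name `adi_rate` is hypothetical rather than part of the ADI codebase.

```r
# Hypothetical helper: ADI rate = total cases across domains / population.
adi_rate <- function(cases_claims, cases_crime, cases_health, pop) {
  (cases_claims + cases_crime + cases_health) / pop
}

# Claims, crime and population from LSOA E01000001;
# the health count (50) is an assumed placeholder, not real data.
rate <- adi_rate(cases_claims = 70, cases_crime = 234, cases_health = 50, pop = 1102)
round(rate, 3)
```

Because the result is a rate per person, it can exceed 1 when an area records more cases than residents, as in the City of London district row.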
ADI vs IMD:
Like IMD, it contains granular data at Lower Super Output Area (LSOA) levels
Unlike IMD, it is updated annually (IMD is calculated approx. every 5 years), so indicators can be monitored more frequently and closer to real time
Because ADI is cardinal, we can measure change in absolute levels of deprivation over time
We can also measure levels of and changes in inequality
Here, I’m exploring inequality in the ADI and seeing whether it’s useful for insights at “hyper-local” level (and e.g., useful in MRP)
caveat: ADI uses statistical (census, geographic) and not administrative (ward) boundaries, and I haven’t yet mapped LSOAs to wards, which change over time – so no voting insights yet.
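To make the cardinal vs ordinal distinction concrete, here is a minimal sketch with made-up rates for three areas. It assumes every area worsens by the same proportion, so the ranking is unchanged while the levels are not.

```r
# Hypothetical rates for three areas in two years (not real ADI data).
rate_y1 <- c(a = 0.20, b = 0.40, c = 0.80)
rate_y2 <- rate_y1 * 1.5   # every area gets 50% worse

# Ordinal view: ranks are identical in both years, so no change is visible.
rank_y1 <- rank(-rate_y1)
rank_y2 <- rank(-rate_y2)
identical(rank_y1, rank_y2)

# Cardinal view: the absolute change in each area's rate is recoverable.
rate_y2 - rate_y1
```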
## # A tibble: 6 × 12
## area_code area_name pop cases_claims rate_claims cases_crime rate_crime
## <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 E01000001 City of Londo… 1102 70 0.0635 234 0.212
## 2 E01000002 City of Londo… 1074 110 0.102 353 0.329
## 3 E01000003 City of Londo… 985 345 0.350 69 0.0701
## 4 E01000005 City of Londo… 693 465 0.671 596 0.860
## 5 E01000006 Barking and D… 1410 555 0.394 105 0.0745
## 6 E01000007 Barking and D… 1114 1230 1.10 627 0.563
## # ℹ 5 more variables: cases_health <dbl>, rate_health <dbl>,
## # cases_ADI_LSOA <dbl>, rate_ADI_LSOA <dbl>, year <fct>
## # A tibble: 6 × 11
## district pop cases_ADI rate_ADI cases_claims rate_claims cases_crime
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 City of London 5342 9435 1.77 1260 0.236 7916
## 2 Barking and Da… 142423 110809 0.778 77310 0.543 23406
## 3 Barnet 290457 122196 0.421 72830 0.251 33483
## 4 Bexley 188214 77105 0.410 49005 0.260 17928
## 5 Bromley 255478 103496 0.405 57535 0.225 27554
## 6 Brent 252001 161200 0.640 109725 0.435 34383
## # ℹ 4 more variables: rate_crime <dbl>, cases_health <dbl>, rate_health <dbl>,
## # year <fct>
There are different ways we can measure inequality; most often we use income. One option is to use data on individual income levels (wages, total assets, etc.) and compare those at different ends of the distribution. This section gives some background on the approach taken later for the ADI.
Fig. below shows median income levels by income quintile; i.e., rank all individuals from lowest to highest income and partition them into five equal-sized groups. We can eyeball the differences between groups and get broad insights: e.g., median income of the top quintile is over 4x higher than the bottom.
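The quintile construction described above can be sketched in base R. The incomes here are simulated from a log-normal distribution purely for illustration; they are not the ONS data behind the figure, and the distribution parameters are arbitrary assumptions.

```r
set.seed(1)
# Simulated incomes in £000s (log-normal); purely illustrative, not ONS data.
income <- rlnorm(1000, meanlog = 3.3, sdlog = 0.6)

# Rank individuals from lowest to highest and split into five equal-sized groups.
quintile <- cut(
  rank(income, ties.method = "first"),
  breaks = seq(0, length(income), length.out = 6),
  labels = c("Bottom", "2nd", "3rd", "4th", "Top")
)

# Median income within each quintile, and the top-to-bottom ratio.
median_by_quintile <- tapply(income, quintile, median)
median_by_quintile
median_by_quintile[["Top"]] / median_by_quintile[["Bottom"]]
```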
d_UK_quint_long <- d_UK_quint |>
filter(Year == "2021/22") |>
mutate(
across(-Year, ~ as.numeric(.x) / 1000)
) |>
pivot_longer(cols = -Year, names_to = "quintile", values_to = "median_income") |>
select(-Year)
d_UK_quint_long$quintile <- fct_relevel(d_UK_quint_long$quintile, "Bottom", "2nd", "3rd", "4th", "Top")
f_UK_median <- ggplot(
data = d_UK_quint_long |> filter(quintile != "All.Individuals"),
aes(x = quintile, y = median_income, group = 1)
) +
geom_point() +
geom_line() +
geom_hline(
yintercept = (d_UK_quint_long |> filter(quintile == "All.Individuals"))$median_income,
col = lcColor, size = 1, linetype = "dotted"
) +
theme_ipsum() +
scale_y_continuous(name = "median income (£000s)", breaks = seq(0, 100, 10)) +
coord_cartesian(ylim = c(0, 80)) +
theme(
axis.title.y = element_text(color = mainColor, size = 20),
axis.title.x = element_text(color = yearColor, size = 20),
) +
ggtitle("UK median income by income quintile, 2021/22")
f_UK_median
A popular alternative is to draw a Lorenz curve and calculate a Gini coefficient from the underlying data.
The Lorenz curve plots the cumulative proportion of total income by the cumulative proportion of the population, with the latter ranked from lowest to highest (often grouped, here into percentiles).
The figure usually includes a diagonal line of perfect equality, representing the case where each population group earns their equivalent proportion of total income.
The curve tells us the proportion of income that is earned by the poorest X% of the population. The graph below shows us that the poorest half (50%) of the population earn around 27% of all income.
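A minimal sketch of how the points of a Lorenz curve are built, using ten made-up incomes rather than the WIID percentile data plotted in this document:

```r
# Hypothetical incomes for ten people (not real data).
income <- c(5, 8, 12, 15, 20, 25, 30, 40, 60, 120)

# Rank from lowest to highest, then accumulate population and income shares.
income_sorted    <- sort(income)
cum_pop_share    <- seq_along(income_sorted) / length(income_sorted)
cum_income_share <- cumsum(income_sorted) / sum(income_sorted)

lorenz <- data.frame(cum_pop_share, cum_income_share)
lorenz

# Share of total income earned by the poorest half (row 5 of 10).
lorenz$cum_income_share[5]
```

Each row reads as "the poorest X% of people earn Y% of income"; the curve always ends at (1, 1) and never rises above the diagonal.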
d_UK_ptile_long <- d_UK_ptile |>
select(starts_with("p"), -population, -palma) |>
pivot_longer(cols = everything(), names_to = "pctile", values_to = "pctile_income_share",
names_prefix = "p", names_transform = list(pctile = as.numeric)) |>
arrange(pctile) |>
mutate(
cum_income_share = cumsum(pctile_income_share)
)
f_UK_pctile <- ggplot(
data = d_UK_ptile_long,
aes(x = pctile, y = cum_income_share, group = 1)
) +
geom_point() +
geom_line() +
geom_abline(intercept = 0, slope = 1, col = lcColor, lwd = 1) +
theme_ipsum() +
scale_y_continuous(name = "cumulative share of income", breaks = seq(0, 100, 10)) +
scale_x_continuous(name = "cumulative proportion of population", breaks = seq(0, 100, 10)) +
coord_cartesian(ylim = c(0, 100), xlim = c(0,100)) +
theme(
axis.title.y = element_text(color = mainColor, size = 20),
axis.title.x = element_text(color = yearColor, size = 20),
) +
ggtitle("UK income Lorenz curve, 2019 data")
f_UK_pctile
# source: https://www.wider.unu.edu/database/world-income-inequality-database-wiid
ggplotly(f_UK_pctile)
The Gini coefficient is a measure of inequality based on this data. It roughly measures the percentage difference between the area under the line of perfect equality and the area under the Lorenz curve.
The intuition is that the further ‘away’ the Lorenz curve is from the line of perfect equality—so, the smaller the area under the curve—the more unequal the income distribution is and the greater the difference will be.
… i.e. the ‘flatter-then-steeper’ the curve, the lower the share of total income earned by the poorest groups.
A higher Gini coefficient thus indicates greater income inequality.
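One common way to write this is G = A / (A + B), where A is the area between the line of perfect equality and the Lorenz curve and B is the area under the Lorenz curve. Since A + B = 0.5, this is the same as G = 1 - 2B. The sketch below computes B with the trapezoid rule; `gini_from_lorenz` is a hypothetical helper name, and the inputs are made-up values.

```r
# Gini coefficient from individual values via the Lorenz curve.
# `gini_from_lorenz` is a hypothetical helper, not a library function.
gini_from_lorenz <- function(x) {
  x <- sort(x)
  n <- length(x)
  # Lorenz curve points, including the origin (0, 0).
  p <- c(0, seq_len(n) / n)
  L <- c(0, cumsum(x) / sum(x))
  # Area under the Lorenz curve (B) by the trapezoid rule.
  area_under_curve <- sum(diff(p) * (head(L, -1) + tail(L, -1)) / 2)
  # Area under the line of equality is 0.5, so G = (0.5 - B) / 0.5.
  1 - 2 * area_under_curve
}

gini_from_lorenz(c(10, 10, 10, 10))   # perfectly equal: 0
gini_from_lorenz(c(0, 0, 0, 40))      # one person holds everything: 0.75
```

With only four people the maximum is 0.75 rather than 1, because the discrete Gini is bounded by (n - 1) / n.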
Though widely used, the curve has several drawbacks; e.g., plotting multiple Lorenz curves on the same figure becomes hard to interpret.
But it’s useful to measure changes in inequality over time, as in the figure below. We can see income inequality rising from the 80s until around the time of the financial crisis, and then slowly falling. The figure also broadly hints at the role of public transfers in reducing inequality, with a slight widening in the difference between disposable and gross (after transfers) income.
d_UK_gini_long <- d_UK_gini |>
mutate(across(-year, ~ as.numeric(.x))) |>
pivot_longer(cols = -year, names_to = "income_type", values_to = "gini") |>
mutate(
gini = gini / 100,
year_b = str_sub(year, start = 1, end = 2),
year_e = str_sub(year, start = -2, end = -1),
year = as.numeric(paste0(year_b, year_e)),
year = ifelse(year == "1900", "2000", year)
) |>
arrange(year) |> select(-year_b, -year_e)
f_UK_gini <- ggplot(
data = d_UK_gini_long,
aes(x = year, y = gini, group = income_type, color = income_type)
) +
geom_line() +
theme_ipsum() +
scale_y_continuous(name = "Gini coefficient", breaks = seq(0, 1, 0.1)) +
scale_x_discrete(name = "year", breaks = seq(0, 3000, 10)) +
scale_color_discrete(name = "Income type") +
coord_cartesian(ylim = c(0.1, 0.6)) +
theme(
axis.title.y = element_text(color = mainColor, size = 20),
axis.title.x = element_text(color = yearColor, size = 20),
# legend.position = c(0.9, 0.2),
panel.grid.minor.x = element_blank()) +
guides(color = guide_legend(reverse = TRUE)) +
ggtitle(label = "Gini coefficient for UK income")
f_UK_gini
# source: https://www.ons.gov.uk/peoplepopulationandcommunity/personalandhouseholdfinances/incomeandwealth/datasets/householddisposableincomeandinequality
We must be careful though, as the Gini coefficient alone may not be very sensitive. The figure below shows the Gini coefficient on disposable income from the past three years, with 95% confidence intervals to communicate uncertainty in UK-level estimates of mean and median income. There’s a significant overlap, so conclusions about changes must be tempered!
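To show how sampling uncertainty around a Gini estimate can arise, here is a sketch of a percentile bootstrap on simulated incomes. This is an assumption-laden illustration: it is not necessarily how the ONS produces its intervals, the data are simulated, and `gini` and `boot_ci` are hypothetical helpers.

```r
set.seed(42)
# Simulated incomes for two survey years (log-normal); not ONS data.
income_y1 <- rlnorm(500, meanlog = 10, sdlog = 0.65)
income_y2 <- rlnorm(500, meanlog = 10, sdlog = 0.68)

# Gini coefficient from sorted values (standard discrete formula).
gini <- function(x) {
  x <- sort(x)
  n <- length(x)
  2 * sum(x * seq_len(n)) / (n * sum(x)) - (n + 1) / n
}

# Percentile bootstrap interval: resample with replacement, recompute the Gini.
boot_ci <- function(x, n_boot = 1000, level = 0.95) {
  stats <- replicate(n_boot, gini(sample(x, replace = TRUE)))
  alpha <- (1 - level) / 2
  unname(quantile(stats, c(alpha, 1 - alpha)))
}

ci_y1 <- boot_ci(income_y1)
ci_y2 <- boot_ci(income_y2)

# Do the two intervals overlap? If so, an apparent change deserves caution.
overlap <- ci_y1[1] <= ci_y2[2] && ci_y2[1] <= ci_y1[2]
rbind(ci_y1, ci_y2)
overlap
```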
f_UK_gini_recent <- ggplot(
data = d_UK_gini_recent,
aes(x = year, y = gini_disposable_income)
) +
geom_point(size = 3) +
geom_errorbar(aes(ymin = lower_bound, ymax = upper_bound), width = 0.5) +
theme_ipsum() +
scale_y_continuous(name = "Gini coefficient", breaks = seq(0, 1, 0.01)) +
coord_cartesian(ylim = c(0.32, 0.38)) +
theme(
axis.title.y = element_text(color = mainColor, size = 20),
axis.title.x = element_text(color = yearColor, size = 20),
legend.position = c(0.9, 0.25)
) +
ggtitle("Gini coefficient in UK for disposable income")
f_UK_gini_recent
# source: Figure 1 @ https://www.ons.gov.uk/peoplepopulationandcommunity/personalandhouseholdfinances/incomeandwealth/bulletins/householdincomeinequalityfinancial/financialyearending2022
# ---------- ADI Gini preparation
n_bins <- 100
d_gini_LSOA <- d_LSOA |>
arrange(year) |>
group_by(year) |>
arrange(desc(rate_ADI_LSOA)) |> #
mutate(
# x-axis: rank LSOAs by ADI rate (i.e., LSOAs are normalised by person), get cumulative number and proportion of population
pop_TOTAL = sum(pop, na.rm = TRUE),
pop_LSOA_cumsum = cumsum(pop),
pop_LSOA_cumprop = pop_LSOA_cumsum / pop_TOTAL,
# percentiles for ease of interpretation (but less accuracy) and Gini calculation (requires equally weighted bins)
pop_bins = cut_interval(pop_LSOA_cumsum, n = n_bins, labels = FALSE), # Break into bins
pop_bins_p = pop_bins / n_bins,
## OR: cut(pop_cumsum, breaks = 100, labels = FALSE)
cases_ADI_TOTAL = sum(cases_ADI_LSOA, na.rm = TRUE),
cases_ADI_prop = cases_ADI_LSOA / cases_ADI_TOTAL,
cases_ADI_cumprop = cumsum(cases_ADI_prop)
) |>
ungroup()
# ---------- Lorenz curve calculations
d_gini_bins <- d_gini_LSOA |>
group_by(year, pop_bins, pop_bins_p) |>
summarise(
pop_bins_sum = sum(pop, na.rm = TRUE),
cases_ADI_bins = sum(cases_ADI_LSOA, na.rm = TRUE),
cases_claims_bins = sum(cases_claims, na.rm = TRUE),
cases_crime_bins = sum(cases_crime, na.rm = TRUE),
cases_health_bins = sum(cases_health, na.rm = TRUE)
) |>
ungroup() |>
group_by(year) |>
mutate( # cumulative sum of cases by bins, for plotting Lc
cases_ADI_bins_cumsum = cumsum(cases_ADI_bins),
cases_ADI_bins_cumsum_norm = cases_ADI_bins_cumsum / max(cases_ADI_bins_cumsum),
cases_claims_bins_cumsum = cumsum(cases_claims_bins),
cases_claims_bins_cumsum_norm = cases_claims_bins_cumsum / max(cases_claims_bins_cumsum),
cases_crime_bins_cumsum = cumsum(cases_crime_bins),
cases_crime_bins_cumsum_norm = cases_crime_bins_cumsum / max(cases_crime_bins_cumsum),
cases_health_bins_cumsum = cumsum(cases_health_bins),
cases_health_bins_cumsum_norm = cases_health_bins_cumsum / max(cases_health_bins_cumsum),
) |>
ungroup()
## `summarise()` has grouped output by 'year', 'pop_bins'. You can override using
## the `.groups` argument.
d_gini_bins_long <- d_gini_bins |>
select(year, pop_bins_p, cases_ADI_bins_cumsum_norm, cases_claims_bins_cumsum_norm, cases_crime_bins_cumsum_norm, cases_health_bins_cumsum_norm) |>
rename_with(cols = everything(), ~ str_remove(.x, "_bins_cumsum_norm")) |>
pivot_longer(cols = starts_with("cases_"), names_to = "Type", values_to = "Lc", names_prefix = "cases_")
## ----- Plot Lorenz curve
range_years <- unique(d_gini_bins_long$year) #>> select range of years
range_years
## [1] 2013 2014 2015 2016 2017 2018 2019 2020 2021
## Levels: 2013 2014 2015 2016 2017 2018 2019 2020 2021
# Lc: ADI by category, most recent year
most_recent_year <- c("2021") # >> year select
# add zero row
new_row <- tibble(year = "2021", pop_bins_p = 0, Type = "ADI", Lc = 0.00)
# new_row <- data.frame(lapply(d_gini_bins_long, function(x) 0))
d_gini_bins_long_recent <- rbind(
new_row,
d_gini_bins_long |> filter(year %in% most_recent_year)
) |> select(-year)
f_lc_recent <- ggplot(
data = d_gini_bins_long_recent |> filter(Type == "ADI"),
aes(y = Lc, x = pop_bins_p, group = 1)) +
geom_line(color = mainColor) +
geom_point(color = mainColor) +
geom_abline(intercept = 0, slope = 1, col = lcColor, lwd = 1) +
scale_y_continuous(name = "cumulative normalised ADI cases", breaks = seq(0, 1, 0.1)) +
scale_x_continuous(name = "cumulative normalised rank of ADI cases", breaks = seq(0, 1, 0.1)) +
coord_cartesian(ylim = c(0,1), xlim = c(0,1)) +
theme_ipsum() +
theme(
axis.title.y = element_text(color = mainColor, size = 20),
axis.title.x = element_text(color = yearColor, size = 20),
legend.position = c(0.8, 0.2)
) +
ggtitle(paste0("Lorenz curve of ADI cases in ", most_recent_year))
f_lc_recent
# >> plotly here
f_lc_recent_cat <- ggplot(
data = d_gini_bins_long_recent |> filter(Type == "ADI"),
aes(y = Lc, x = pop_bins_p, group = 1)) +
geom_line(color = mainColor) +
geom_point(color = mainColor) +
geom_line(data = d_gini_bins_long_recent |> filter(Type != "ADI"),
aes(group = Type, color = Type),
linetype = 3, size = 2) +
geom_abline(intercept = 0, slope = 1, col = lcColor, lwd = 1) +
scale_y_continuous(name = "cumulative normalised ADI cases", breaks = seq(0, 1, 0.1)) +
scale_x_continuous(name = "cumulative normalised rank of ADI cases", breaks = seq(0, 1, 0.1)) +
coord_cartesian(ylim = c(0,1), xlim = c(0,1)) +
theme_ipsum() +
theme(
axis.title.y = element_text(color = mainColor, size = 20),
axis.title.x = element_text(color = yearColor, size = 20),
legend.position = c(0.8, 0.2)
) +
ggtitle(paste0("Lorenz curve of ADI cases in ", most_recent_year, ", by category"))
f_lc_recent_cat
# \\ health inequality is low; note that it's ranked by ADI rate (rather than health rate)
# # Lc: ADI by year
# f_lc_all <- ggplot(
# data = d_gini_bins_long |>
# filter(
# Type == "ADI",
# year %in% range_years
# ),
# aes(y = Lc, x = pop_bins_p, group = year, color = year)) +
# geom_line() +
# geom_point() +
# geom_abline(intercept = 0, slope = 1, col = lcColor, lwd = 1) +
# scale_y_continuous(name = "cumulative normalised ADI cases", breaks = seq(0, 1, 0.1)) +
# scale_x_continuous(name = "cumulative normalised rank of ADI cases", breaks = seq(0, 1, 0.1)) +
# coord_cartesian(ylim = c(0,1), xlim = c(0,1)) +
# theme_ipsum() +
# theme(
# axis.title.y = element_text(color = mainColor, size = 20),
# axis.title.x = element_text(color = yearColor, size = 20),
# legend.position = c(0.9, 0.25)
# ) +
# ggtitle("Lorenz curve of ADI cases, by year")
# f_lc_all
# ---------- Gini coefficient (by year) calculations
d_gini_bins_summary <-
d_gini_bins |>
arrange(year, pop_bins) |>
group_by(year) |>
summarise(
gini_ADI = ineq::Gini(cases_ADI_bins), # so this is per person within LSOA, but it doesn't weigh how different LSOAs are wrt to total contribution to all rates, which I think must happen
gini_claims = ineq::Gini(cases_claims_bins),
gini_crime = ineq::Gini(cases_crime_bins),
gini_health = ineq::Gini(cases_health_bins)
) |>
ungroup()
d_gini_bins_summary_long <- d_gini_bins_summary |>
pivot_longer(cols = c(-year), names_to = "Type", values_to = "gini", names_prefix = "gini_")
# ---------- plot Gini coefficient over time
f_gini_year <- ggplot(
data = d_gini_bins_summary_long |> filter(Type == "ADI"),
aes(y = gini, x = year, group = 1)) +
geom_line(size = 2, color = mainColor) +
coord_cartesian(ylim = c(0,0.6)) +
scale_y_continuous(name = "Gini coefficient") +
theme_ipsum() +
theme(
axis.title.y = element_text(color = mainColor, size = 20),
axis.title.x = element_text(color = yearColor, size = 20)
) +
ggtitle("Gini coefficient of ADI cases, by year")
f_gini_year
## ... and by category >>> add in category to click
f_gini_year_cat <- ggplot(
data = d_gini_bins_summary_long |> filter(Type == "ADI"),
aes(y = gini, x = year, group = 1)) +
geom_line(size = 2, color = mainColor) +
geom_line(data = d_gini_bins_summary_long |> filter(Type != "ADI"),
aes(group = Type, color = Type),
linetype = 3, size = 2) +
coord_cartesian(ylim = c(0,0.6)) +
scale_y_continuous(name = "Gini coefficient") +
theme_ipsum() +
theme(
axis.title.y = element_text(color = mainColor, size = 20),
axis.title.x = element_text(color = yearColor, size = 20),
legend.position = c(0.9, 0.8)
) +
ggtitle("Gini coefficient of ADI cases by category, by year")
f_gini_year_cat
# >>> can add graph selection option to look at difference between selected points
We want the measure to be valid and also to offer some additional variation and insight beyond existing measures, so we compare the ADI Gini with the income Gini.
g_adi <- d_gini_bins_summary_long |> filter(Type == "ADI")
g_inc <- d_UK_gini_long |> mutate(year = as.numeric(year)) |>
filter(year >= 2013, income_type == "disposable") |>
rename(Type = income_type) |> slice(1:(n()-1)) |>
mutate(Type = ifelse(Type == "disposable", "Income", Type))
d_gini_joint <- rbind(g_adi, g_inc) |> arrange(year)
f_gini_joint <- ggplot(
data = d_gini_joint,
aes(y = gini, x = year, group = Type, color = Type)) +
geom_line(size = 2) +
coord_cartesian(ylim = c(0.2,0.4)) +
scale_y_continuous(name = "Gini coefficient") +
theme_ipsum() +
theme(
axis.title.y = element_text(color = mainColor, size = 20),
axis.title.x = element_text(color = yearColor, size = 20),
panel.grid.minor.y = element_blank(),
legend.position = c(0.2, 0.2)
) +
ggtitle("Gini coefficient of UK Income vs ADI cases, over time")
f_gini_joint
# d_gini_joint_diff <- d_gini_joint |>
# pivot_wider(names_from = Type, values_from = gini) |>
# mutate(diff = ADI - Income)
#
# f_gini_joint_diff <- ggplot(
# data = d_gini_joint_diff,
# aes(y = diff, x = year)) +
# # geom_col() +
# geom_point(size = 2) +
# geom_segment(aes(xend = year, y = 0, yend = diff)) +
# coord_cartesian(ylim = c(-0.075, 0.025)) +
# scale_y_continuous(name = "Gini ADI minus Income") +
# theme_ipsum() +
# theme(
# axis.title.y = element_text(color = mainColor, size = 20),
# axis.title.x = element_blank(),
# axis.line.x = element_blank(),
# axis.text.x = element_blank(),
# legend.position = "none"
# )
# f_gini_joint_diff
Side-by-side graphs
d_year_totals <- d_gini_bins |>
# filter(cases_group != 0) |>
group_by(year) |>
summarise(
ADI = max(cases_ADI_bins_cumsum),
# ADI_sum = sum(cases_ADI_bins, na.rm = TRUE), # should give same answer
Claims = max(cases_claims_bins_cumsum),
Crime = max(cases_crime_bins_cumsum),
Health = max(cases_health_bins_cumsum)
) |>
mutate(across(-year, ~ .x /1000000))
# d_year_totals_check <- d_year_totals |> mutate(cases_check = (Claims + Crime + Health ) - ADI)
###
d_year_totals_pivot <- d_year_totals |>
pivot_longer(cols = -year, names_to = "Type", values_to = "N")
# ADI total and by category, over time
f_cases_year <- ggplot(
data = d_year_totals_pivot |> filter(Type == "ADI"),
aes(y = N, x = year, group = 1)) +
geom_line(size = 2, color = mainColor) +
geom_line(data = d_year_totals_pivot |> filter(Type != "ADI"), # Can use d_abs_pivot to include ADI as well
aes(y = N, x = year, group = Type, color = Type),
linetype = 3, size = 2) +
scale_y_continuous(name = "cases, in millions") +
scale_x_discrete(name = "year") +
coord_cartesian(ylim = c(0, 40)) +
theme_ipsum() +
theme(
axis.title.y = element_text(color = mainColor, size = 20),
axis.title.x = element_text(color = yearColor, size = 20),
legend.position = c(0.12,0.8)
) +
ggtitle("ADI total cases by year, and by category")
f_cases_year
# Proportions shown in stacked graph
f_cases_stack <- ggplot(
data = d_year_totals_pivot |> filter(Type != "ADI"),
aes(y = N, x = year, fill = Type)) +
geom_area(
aes(x = as.numeric(year)),
alpha = 0.4
) +
geom_line(aes(group = Type, color = Type), linetype = 3, size = 2) +
# geom_line(data = d_year_totals_pivot |> filter(Type == "ADI"),
# aes(x = year, group = 1, fill = NULL, color = NULL), color = mainColor, size = 2) +
scale_x_discrete(name = "year") +
scale_y_continuous(name = "cases, in millions") +
coord_cartesian(ylim = c(0, 40)) +
theme_ipsum() +
theme(
axis.title.y = element_text(color = mainColor, size = 20),
axis.title.x = element_text(color = yearColor, size = 20),
legend.position = c(0.12,0.8)
) +
ggtitle("ADI total cases by year, and by category (shaded = proportion)")
#geom_line( color = mainColor), size = 2)
f_cases_stack
# https://r-graph-gallery.com/stacked-area-graph.html (see also for plotly and dygaph interactive code)
# ==================== plot district changes over two periods ==========
# hard to look directly at specific LSOA's (indicated by 4-digit code), as some are missing years of data and might be larger variation
# rather, look at districts and then dive into those experiencing more decline
d_district <- d_DIST
glimpse(d_district)
## Rows: 2,934
## Columns: 11
## $ district <chr> "City of London", "Barking and Dagenham", "Barnet", "Bexl…
## $ pop <dbl> 5342, 142423, 290457, 188214, 255478, 252001, 191813, 291…
## $ cases_ADI <dbl> 9435, 110809, 122196, 77105, 103496, 161200, 107556, 1648…
## $ rate_ADI <dbl> 1.7661924, 0.7780274, 0.4207025, 0.4096667, 0.4051073, 0.…
## $ cases_claims <dbl> 1260, 77310, 72830, 49005, 57535, 109725, 54315, 106850, …
## $ rate_claims <dbl> 0.2358667, 0.5428196, 0.2507428, 0.2603685, 0.2252053, 0.…
## $ cases_crime <dbl> 7916, 23406, 33483, 17928, 27554, 34383, 42108, 40726, 38…
## $ rate_crime <dbl> 1.48184201, 0.16434143, 0.11527696, 0.09525328, 0.1078527…
## $ cases_health <dbl> 259, 10093, 15883, 10172, 18407, 17092, 11133, 17232, 168…
## $ rate_health <dbl> 0.04848371, 0.07086636, 0.05468279, 0.05404486, 0.0720492…
## $ year <fct> 2013, 2013, 2013, 2013, 2013, 2013, 2013, 2013, 2013, 201…
# Looking at individual district progress over time, ideas:
# > look at position changes, either by rank or pop bin change each year
# > likelihood of moving up or down
## ----------- average district-level ADI rate over time
fun_type <- "mean"
f_rate_by_year_dist <-
ggplot(
data = d_district,
aes(x = year, y = rate_ADI)
) +
geom_violin() +
stat_summary(
geom = "point",
fun = fun_type,
color = mainColor, size = 2
) +
stat_summary(
group = 1,
geom = "line",
fun = fun_type,
color = mainColor, size = 1, linetype = 3
) +
coord_cartesian(ylim = c(0, 2)) +
scale_y_continuous(name = "average ADI rate") +
scale_x_discrete(name = "year") +
theme_ipsum() +
theme(
axis.title.y = element_text(color = mainColor, size = 20),
axis.title.x = element_text(color = yearColor, size = 20),
legend.position = "right",
panel.grid.major.x = element_blank()
) +
guides(color = guide_legend(reverse = TRUE)) +
ggtitle("distribution of district-level ADI rate, by year")
f_rate_by_year_dist
# create deciles using previous bins
d_gini_deciles_base <- d_gini_bins |>
arrange(year, pop_bins) |>
group_by(year) |>
mutate(
pop_deciles = as.factor(cut_interval(pop_bins, n = 10, labels = FALSE)),
) |>
ungroup()
# print(d_gini_deciles |> count(year, pop_deciles), n = 200)
d_gini_deciles <- d_gini_deciles_base |>
group_by(year, pop_deciles) |>
summarise(
pop_decile = sum(pop_bins_sum),
cases_ADI_decile = sum(cases_ADI_bins, na.rm = TRUE), # deciles are equal sized, so no need to weight according to pop size
rate_ADI_decile = cases_ADI_decile / pop_decile,
cases_claims_decile = sum(cases_claims_bins, na.rm = TRUE),
rate_claims_decile = cases_claims_decile / pop_decile,
cases_crime_decile = sum(cases_crime_bins, na.rm = TRUE),
rate_crime_decile = cases_crime_decile / pop_decile,
cases_health_decile = sum(cases_health_bins, na.rm = TRUE),
rate_health_decile = cases_health_decile / pop_decile
)
## `summarise()` has grouped output by 'year'. You can override using the
## `.groups` argument.
## ---------- plot rates by decile over time
f_area_rate_by_year <- ggplot(
data = d_gini_deciles,
aes(y = rate_ADI_decile, x = year, group = pop_deciles, color = pop_deciles)
) +
geom_line(size = 2) +
scale_y_continuous(name = "ADI rate (average cases/person)") +
scale_x_discrete(name = "years") +
coord_cartesian(ylim = c(0, 2)) +
theme_ipsum() +
labs(color = "ADI rate decile\n(1=most deprived)") +
theme(
axis.title.y = element_text(color = mainColor, size = 20),
axis.title.x = element_text(color = yearColor, size = 20),
legend.position = "right"
) +
ggtitle("ADI rate by LSOA decile rank, by year")
f_area_rate_by_year
# rate changes by band
year_input_1 <- "2020"
year_input_2 <- "2021"
# d_prog_AREA_band_sum_change <- d_prog_AREA_band_sum |>
# filter(year %in% c(year_1, year_2)) |>
# pivot_wider(names_from = year, values_from = decile_rate_mean, names_prefix = "rate_") |>
# mutate(rate_change = rate_2021 - rate_2020)
# rate change by decile
d_gini_deciles_change <- d_gini_deciles |>
filter(year == year_input_1 | year == year_input_2) |>
select(year, pop_deciles, starts_with("rate_")) |>
rename_with(cols = everything(), ~ str_remove_all(.x, "rate_|_decile")) |>
arrange(pops, year) |>
group_by(pops) |>
mutate(
change_ADI = ADI - lag(ADI),
change_claims = claims - lag(claims),
change_crime = crime - lag(crime),
change_health = health - lag(health),
change_color = as.factor(ifelse(change_ADI >= 0, "pos", "neg"))
) |>
ungroup()
d_gini_deciles_change_base <- d_gini_deciles_change |>
select(pops, change_color, starts_with("change_")) |>
filter(!is.na(change_ADI))
# graph rate changes by decile
f_rate_change_base <-
ggplot(
data = d_gini_deciles_change_base,
aes(x = pops, y = change_ADI, group = change_color, color = change_color)
) +
geom_point(size = 3) +
geom_segment(aes(xend = pops, y = 0, yend = change_ADI), size = 1.5) +
# geom_errorbar() +
geom_hline(yintercept = 0, size = 0.25, color = "grey") +
coord_cartesian(ylim = c(-0.05, 0.15)) +
scale_y_continuous(name = "change in ADI rate") +
scale_x_discrete(name = "LSOA ADI rate decile rank (1=most deprived)") +
scale_color_manual(values=c(lcColor, yearColor)) +
theme_ipsum() +
theme(
axis.title.y = element_text(color = mainColor, size = 20),
axis.title.x = element_text(color = yearColor, size = 20),
legend.position = "none"
) +
ggtitle(paste0("change in ADI rate between ", year_input_1," and ", year_input_2,", by LSOA ADI decile rank"))
f_rate_change_base
But the previous level matters: the same absolute change means something different depending on where a decile started.
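A small sketch of why the starting level matters, using made-up decile rates (not ADI data). The same absolute change is a much larger relative change for a decile that started from a low rate.

```r
# Hypothetical decile rates for two years (not real ADI data).
d <- data.frame(
  decile    = c(1, 5, 10),
  rate_prev = c(1.20, 0.50, 0.20),
  rate_now  = c(1.25, 0.55, 0.25)
)

# Absolute change is the same everywhere; relative change is not.
d$change_abs <- d$rate_now - d$rate_prev
d$change_rel <- d$change_abs / d$rate_prev
d
```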
# graph rate changes by decile, from previous year level
d_gini_deciles_change_orig <- d_gini_deciles_change |>
select(pops, year, change_color, ADI, change_ADI) |>
arrange(pops, year) |>
group_by(pops) |>
mutate(
ADI_1 = lag(ADI),
ADI_2 = ADI
) |>
ungroup() |>
filter(!is.na(change_ADI)) |>
select(-year)
f_rate_change_orig <-
ggplot(
data = d_gini_deciles_change_orig,
aes(x = pops, y = ADI_1, group = change_color, color = change_color)
) +
geom_point(size = 1.5) +
geom_point(aes(y=ADI_2), size = 1.5) +
geom_segment(
aes(xend = pops, y = ADI_1, yend = ADI_2), size = 1.5,
) +
geom_segment(
aes(xend = pops, y = ADI_1, yend = ADI_2),
arrow = arrow(length = unit(0.015, "npc")),
position = position_nudge(x = -0.3)
) +
# geom_errorbar() +
# geom_hline(yintercept = 0, size = 0.25, color = "grey") +
# coord_cartesian(ylim = c(-0.05, 0.15)) +
scale_y_continuous(name = "ADI rate") +
scale_x_discrete(name = "LSOA ADI rate decile rank (1=most deprived)") +
scale_color_manual(values=c(lcColor, yearColor)) +
theme_ipsum() +
theme(
axis.title.y = element_text(color = mainColor, size = 20),
axis.title.x = element_text(color = yearColor, size = 20),
legend.position = "none"
) +
ggtitle(paste0("change in ADI rate between ", year_input_1," and ", year_input_2,", by LSOA ADI decile rank"))
f_rate_change_orig
year_input_district_1 <- "2020"
year_input_district_2 <- "2021"
d_district_lag <- d_district |>
arrange(district, year) |>
group_by(district) |>
mutate (
rate_change = round(rate_ADI - lag(rate_ADI), digits = 5),
) |>
ungroup()
d_district_lag_select<- d_district_lag |>
filter(year %in% c(year_input_district_1, year_input_district_2))
# Is there a natural cutoff?
d_select_cutoff <- d_district_lag_select |>
filter(year == year_input_district_2) |>
arrange(desc(rate_change)) |>
mutate(
index = row_number(),
qlnorm = qlnorm(rate_change+0.1) - 0.1
)
plot(d_select_cutoff$qlnorm)
f_ADI_cutoff <- ggplot(
data = d_select_cutoff,
aes(x= index, y=rate_change, text = district)
) +
geom_point() +
geom_hline(yintercept = 0, size = 0.5, color = "grey") +
scale_y_continuous(name = "ADI rate change", breaks = seq(-0.2,0.3, 0.05)) +
scale_x_continuous(name = "District") +
coord_cartesian(ylim = c(-0.1, 0.25)) +
theme_ipsum() +
theme(
axis.text.x = element_blank(),
axis.title.y = element_text(color = mainColor, size = 20),
axis.title.x = element_text(color = yearColor, size = 20),
)
f_ADI_cutoff
# ggplotly(f_ADI_cutoff, tooltip = c("area_name_sub"))
# Table of performers
d_district_select <- d_district_lag_select |>
filter(year == year_input_district_2) |>
select(district, pop, rate_change, cases_ADI, rate_ADI, everything(), -year)
d_district_select |>
arrange(desc(rate_change)) |>
gt() |>
tab_header(title = "District-level ADI info and sub-categories") |>
tab_options(
ihtml.active = TRUE,
ihtml.use_pagination = TRUE,
ihtml.use_highlight = TRUE,
ihtml.page_size_default = 20
# ihtml.use_filters = TRUE,
# ihtml.use_search = TRUE
) |>
fmt_number(
columns = starts_with("rate_"),
) |>
fmt_number(
columns = c(pop, starts_with("cases")),
decimals = 0
) |>
data_color(columns = rate_change,
palette = "#e66000"
) |>
cols_label(
district = "District",
pop = "Population",
rate_change = "ADI rate change",
cases_ADI = "ADI cases",
rate_ADI = "ADI rate",
cases_claims = "Claims cases",
rate_claims = "Claims rate",
cases_crime = "Crime cases",
rate_crime = "Crime rate",
cases_health = "Health cases",
rate_health = "Health rate",
)
# Get worst and best performers
n_slice <- 20
d_select_worst <- d_district_lag_select |>
filter(year == year_input_district_2) |>
arrange(desc(rate_change)) |>
slice_head(n=n_slice)
d_select_best <- d_district_lag_select |>
filter(year == year_input_district_2) |>
arrange(rate_change) |>
slice_head(n=n_slice)
d_district_lag_worst <- d_district_lag_select |>
filter(district %in% d_select_worst$district)
d_district_lag_best <- d_district_lag_select |>
filter(district %in% d_select_best$district)
d_district_lag_main <- d_district_lag_select |>
filter(!(district %in% d_select_worst$district) & !(district %in% d_select_best$district))
f_rate_worst <-
ggplot(
data = d_district_lag_main,
aes(x = year, y = rate_ADI, label = district)
) +
geom_point(color = "grey", alpha = 0.3, position = position_jitter()) +
geom_boxplot(alpha = 0.2) +
# biggest rate increases (worst performers)
geom_point(data = d_district_lag_worst, color = lcColor) +
geom_line(data = d_district_lag_worst, aes(group = district), color = lcColor) +
coord_cartesian(ylim = c(0.25, 1.5)) +
scale_y_continuous(name = " average ADI rate", breaks = seq(0, 2, 0.25)) +
scale_x_discrete(name = "year") +
scale_color_manual(values=c(lcColor, yearColor)) +
theme_ipsum() +
theme(
axis.title.y = element_text(color = mainColor, size = 20),
axis.title.x = element_text(color = yearColor, size = 20),
# legend.position = "none"
) +
ggtitle(paste0("worst-performing districts (ADI rate between ", year_input_district_1," and ", year_input_district_2, ")"))
f_rate_worst
# ggplotly(f_rate_worst)
f_rate_best <-
ggplot(
data = d_district_lag_main,
aes(x = year, y = rate_ADI, label = district)
) +
geom_point(color = "grey", alpha = 0.3, position = position_jitter()) +
geom_boxplot(alpha = 0.2) +
# biggest rate decreases (best performers)
geom_point(data = d_district_lag_best, color = mainColor) +
geom_line(data = d_district_lag_best, aes(group = district), color = mainColor) +
coord_cartesian(ylim = c(0.25, 1.5)) +
scale_y_continuous(name = " average ADI rate", breaks = seq(0, 2, 0.25)) +
scale_x_discrete(name = "year") +
scale_color_manual(values=c(lcColor, yearColor)) +
theme_ipsum() +
theme(
axis.title.y = element_text(color = mainColor, size = 20),
axis.title.x = element_text(color = yearColor, size = 20),
# legend.position = "none"
) +
ggtitle(paste0("best-performing districts (ADI rate between ", year_input_district_1," and ", year_input_district_2, ")"))
f_rate_best
# ggplotly(f_rate_best)
f_rate_both <-
ggplot(
data = d_district_lag_main,
aes(x = year, y = rate_ADI, label = district)
) +
geom_point(color = "grey", alpha = 0.3, position = position_jitter()) +
geom_boxplot(alpha = 0.2) +
# biggest rate decreases (best performers)
geom_point(data = d_district_lag_best, color = mainColor) +
geom_line(data = d_district_lag_best, aes(group = district), color = mainColor) +
# biggest rate increases (worst performers)
geom_point(data = d_district_lag_worst, color = lcColor) +
geom_line(data = d_district_lag_worst, aes(group = district), color = lcColor) +
coord_cartesian(ylim = c(0.25, 1.5)) +
scale_y_continuous(name = " average ADI rate", breaks = seq(0, 2, 0.25)) +
scale_x_discrete(name = "year") +
scale_color_manual(values=c(lcColor, yearColor)) +
theme_ipsum() +
theme(
axis.title.y = element_text(color = mainColor, size = 20),
axis.title.x = element_text(color = yearColor, size = 20),
# legend.position = "none"
) +
ggtitle(paste0("district-level ADI rate between ", year_input_district_1," and ", year_input_district_2))
f_rate_both
# ## print tables
#
# d_worst_table <- d_district_lag_worst |>
# select(area_name_sub, year, rate_ADI_AREA) |>
# rename(district = area_name_sub, rate = rate_ADI_AREA) |>
# mutate(rate = round(rate, digits=2)) |>
# group_by(district) |>
# pivot_wider(names_from = year, values_from = rate, names_prefix = "rate_") |>
# mutate(rate_change = rate_2021 - rate_2020) |>
# ungroup() |>
# arrange(desc(rate_change))
# d_worst_table
#
# d_best_table <- d_district_lag_best |>
# select(area_name_sub, year, rate_ADI_AREA) |>
# rename(district = area_name_sub, rate = rate_ADI_AREA) |>
# mutate(rate = round(rate, digits=2)) |>
# group_by(district) |>
# pivot_wider(names_from = year, values_from = rate, names_prefix = "rate_") |>
# mutate(rate_change = rate_2021 - rate_2020) |>
# ungroup() |>
# arrange(rate_change)
# d_best_table